合成伪样品当前是解决广义零局学习(GZSL)问题的最有效方法。大多数模型都达到了竞争性能,但仍然遇到两个问题:(1)功能令人困惑,整体表示混淆了与任务相关和与任务无关的功能,并且现有模型以生成的方式将它们分解,但是它们是不合理的,无法合成可靠的可靠伪样品样本样品有限; (2)分布不确定性,当现有模型合成不确定分布的样本时,需要大量数据,这在有限的可见类样品中导致性能差。在本文中,我们提出了一个非生成模型,以在两个模块中相应地解决这些问题:(1)与任务相关的功能分离,将任务相关的功能从任务无关的功能中排除,通过对域的对抗性学习域对合理合成的适应性; (2)可控的伪样品合成,以合成具有某些特征的边缘伪钉和中心假样品,以产生更多的多样性和直观的传递。此外,为了描述在培训过程中看到的限制类样本的新场景,我们进一步制定了一个新的ZSL任务,名为“几乎看不见的类别和零射门的唯一类别学习”(FSZU)(FSZU)。对四个基准测试的广泛实验验证了所提出的方法在GZSL和FSZU任务中具有竞争力。
translated by 谷歌翻译
Geometric rectification of images of distorted documents finds wide applications in document digitization and Optical Character Recognition (OCR). Although smoothly curved deformations have been widely investigated by many works, the most challenging distortions, e.g. complex creases and large foldings, have not been studied in particular. The performance of existing approaches, when applied to largely creased or folded documents, is far from satisfying, leaving substantial room for improvement. To tackle this task, knowledge about document rectification should be incorporated into the computation, among which the developability of 3D document models and particular textural features in the images, such as straight lines, are the most essential ones. For this purpose, we propose a general framework of document image rectification in which a computational isometric mapping model is utilized for expressing a 3D document model and its flattening in the plane. Based on this framework, both model developability and textural features are considered in the computation. The experiments and comparisons to the state-of-the-art approaches demonstrated the effectiveness and outstanding performance of the proposed method. Our method is also flexible in that the rectification results can be enhanced by any other methods that extract high-quality feature lines in the images.
translated by 谷歌翻译
机器学习和非接触传感器的进步使您能够在医疗保健环境中理解复杂的人类行为。特别是,已经引入了几种深度学习系统,以实现对自闭症谱系障碍(ASD)等神经发展状况的全面分析。这种情况会影响儿童的早期发育阶段,并且诊断完全依赖于观察孩子的行为和检测行为提示。但是,诊断过程是耗时的,因为它需要长期的行为观察以及专家的稀缺性。我们展示了基于区域的计算机视觉系统的效果,以帮助临床医生和父母分析孩子的行为。为此,我们采用并增强了一个数据集,用于使用在不受控制的环境中捕获的儿童的视频来分析自闭症相关的动作(例如,在各种环境中使用消费级摄像机收集的视频)。通过检测视频中的目标儿童以减少背景噪声的影响,可以预处理数据。在时间卷积模型的有效性的推动下,我们提出了能够从视频帧中提取动作功能并通过分析视频中的框架之间的关系来从视频帧中提取动作功能并分类与自闭症相关的行为。通过对功能提取和学习策略的广泛评估,我们证明了通过膨胀的3D Convnet和多阶段的时间卷积网络实现最佳性能,达到了0.83加权的F1得分,以分类三种自闭症相关的动作,超越表现优于表现现有方法。我们还通过在同一系统中采用ESNET主链来提出一个轻重量解决方案,实现0.71加权F1得分的竞争结果,并在嵌入式系统上实现潜在的部署。
translated by 谷歌翻译
与传统方法相比,学到的图像压缩已在PSNR和MS-SSIM中取得了非凡的速率延伸性能。但是,它遭受了密集的计算,这对于现实世界的应用是无法忍受的,目前导致其工业应用有限。在本文中,我们将神经体系结构搜索(NAS)介绍到具有较低延迟的更有效网络,并利用量化以加速推理过程。同时,已经为提高效率而做出了工程努力。使用PSNR和MS-SSIM的混合损失以更好的视觉质量进行了优化,我们获得的MSSIM比JPEG,JPEG XL和AVIF在所有比特率上都高得多,而JPEG XL和AVIF之间的PSNR则获得了PSNR。与JPEG-Turbo相比,我们的LIC的软件实施实现了可比较甚至更快的推理速度,而多次比JPEG XL和AVIF快。此外,我们的LIC实施达到了145 fps的惊人吞吐量,用于编码为208 fps,用于在Tesla T4 GPU上解码1080p图像。在CPU上,我们实施的延迟与JPEG XL相当。
translated by 谷歌翻译
多模式机器翻译(MMT)通过引入视觉信息来提高翻译质量。但是,现有的MMT模型忽略了图像将带来与文本无关的信息的问题,从而引起了大量噪音并影响翻译质量。本文提出了一种用于多模式机器翻译的新型Gumbel注意事项,它选择了图像特征的文本相关部分。具体而言,与以前的基于注意的方法不同,我们首先使用可区分的方法选择图像信息并自动删除图像功能的无用部分。实验证明我们的方法保留了与文本相关的图像特征,其余部分帮助MMT模型生成更好的翻译。
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortion and image content variation, which complicate the distortion patterns crossing different scales and aggravate the difficulty of the regression problem for BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies to make the regression model produce better performance. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression issue to align with the law of human learning process from easy to hard. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets, and the experimental results indicate that the performance of PMT-IQA is superior to the comparison approaches, and both MS and PMT modules improve the model's performance.
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
translated by 谷歌翻译
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
translated by 谷歌翻译